Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation

Authors

  • Paolo Rosso
  • Francisco M. Rangel Pardo
  • Martin Potthast
  • Efstathios Stamatatos
  • Michael Tschuggnall
  • Benno Stein
Abstract

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre perspective; and (iii) author obfuscation, addressing author masking and obfuscation evaluation. In total, 35 teams participated in all three shared tasks of PAN 2016 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework.

Related resources

Overview of the PAN/CLEF 2015 Evaluation Lab

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...

Clustering by Authorship Within and Across Documents

The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estim...

An Overview of the Traditional Authorship Attribution Subtask

This paper describes the Traditional Authorship Attribution subtask of the PAN/CLEF 2012 workshop. As a follow-up to our subtask at PAN/CLEF 2011 (Amsterdam), we established a new corpus for analysis for 2012 (Rome). The new corpus differed in several ways from the previous subtask: – Both the number and size of documents were decreased – The documents were taken from a different genre (fiction,...

Multi Feature Space Combination for Authorship Clustering

The Author Identification task for PAN 2016 consisted of three different subtasks: authorship clustering, authorship links and author diarization. We developed machine learning approaches for two of these three tasks. For the two authorship-related tasks we created various sets of feature spaces. The challenge was to combine these feature spaces to enable the machine learning algorithms t...

Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited

We report on the second large-scale evaluation of style obfuscation approaches in a shared task on author obfuscation, organized at the PAN 2017 lab on digital text forensics. Author obfuscation means to automatically paraphrase a given text such that state-of-the-art authorship verification approaches misjudge a given pair of documents as having been written by “different authors” if in fact t...

Publication date: 2016